Entity and numeric character references

Valid HTML entity references and numeric character references can be used in place of the corresponding Unicode character, with the following exceptions:

  • Entity and character references are not recognized in code blocks and code spans.

  • Entity and character references cannot stand in place of special characters that define structural elements in CommonMark. For example, although &#42; can be used in place of a literal * character, &#42; cannot replace * in emphasis delimiters, bullet list markers, or thematic breaks.

Conforming CommonMark parsers need not store information about whether a particular character was represented in the source using a Unicode character or an entity reference.

Entity references consist of & + any of the valid HTML5 entity names + ;. The document https://html.spec.whatwg.org/multipage/entities.json is used as an authoritative source for the valid entity references and their corresponding code points.

Example 321

Markdown HTML Demo
&nbsp; &amp; &copy; &AElig; &Dcaron;
&frac34; &HilbertSpace; &DifferentialD;
&ClockwiseContourIntegral; &ngE;

<p>  &amp; © Æ Ď
¾ ℋ ⅆ
∲ ≧̸</p>
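The named-reference rule above can be sketched in Python. This is a minimal illustration, not a conforming parser: it assumes that Python's standard `html.entities.html5` table is an acceptable stand-in for entities.json, and the helper name `decode_named` is hypothetical.

```python
import re
from html.entities import html5  # keys such as "copy;" map to the decoded text

# "&" + entity name + ";". The name is looked up together with its trailing
# semicolon, so semicolon-less legacy forms like "&copy" never match.
_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")

def decode_named(text: str) -> str:
    """Replace valid named entity references; leave everything else as is."""
    def repl(m: re.Match) -> str:
        # Unknown names fall back to the original source text.
        return html5.get(m.group(1), m.group(0))
    return _NAMED.sub(repl, text)
```

Note that some names, such as `&ngE;`, decode to more than one code point, which is why the table maps names to strings rather than to single characters.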

Decimal numeric character references consist of &# + a string of 1–7 Arabic digits + ;. A numeric character reference is parsed as the corresponding Unicode character. Invalid Unicode code points will be replaced by the REPLACEMENT CHARACTER (U+FFFD). For security reasons, the code point U+0000 will also be replaced by U+FFFD.

Example 322

Markdown HTML Demo
&#35; &#1234; &#992; &#0;

<p># Ӓ Ϡ �</p>
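A rough Python sketch of the decimal rule follows. The function name `decode_decimal` is hypothetical, and treating surrogates (U+D800 to U+DFFF) and values above U+10FFFF as the "invalid code points" is an assumption of this sketch.

```python
import re

# "&#" + 1 to 7 decimal digits + ";"
_DECIMAL = re.compile(r"&#([0-9]{1,7});")
REPLACEMENT = "\ufffd"  # REPLACEMENT CHARACTER

def decode_decimal(text: str) -> str:
    """Decode decimal numeric character references in text."""
    def repl(m: re.Match) -> str:
        cp = int(m.group(1))
        # Assumption: U+0000, surrogates, and out-of-range values are invalid.
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return REPLACEMENT
        return chr(cp)
    return _DECIMAL.sub(repl, text)
```

A reference with more than seven digits does not match the pattern at all, so it stays in the text unchanged instead of being replaced by U+FFFD.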

Hexadecimal numeric character references consist of &# + either X or x + a string of 1–6 hexadecimal digits + ;. They too are parsed as the corresponding Unicode character (this time specified with a hexadecimal numeral instead of decimal).

Example 323

Markdown HTML Demo
&#X22; &#XD06; &#xcab;

<p>&quot; ആ ಫ</p>
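The hexadecimal form can be sketched the same way. As before, `decode_hex` is a hypothetical name, and the set of code points treated as invalid is an assumption of the sketch.

```python
import re

# "&#" + "X" or "x" + 1 to 6 hexadecimal digits + ";"
_HEX = re.compile(r"&#[Xx]([0-9A-Fa-f]{1,6});")

def decode_hex(text: str) -> str:
    """Decode hexadecimal numeric character references in text."""
    def repl(m: re.Match) -> str:
        cp = int(m.group(1), 16)
        # Assumption: U+0000, surrogates, and out-of-range values are invalid.
        if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            return "\ufffd"
        return chr(cp)
    return _HEX.sub(repl, text)
```

The decoded `"` from `&#X22;` appears as `&quot;` in the HTML output only because the renderer escapes it afterwards; that step is separate from decoding.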

Here are some nonentities:

Example 324

Markdown HTML Demo
&nbsp &x; &#; &#x;
&#87654321;
&#abcdef0;
&ThisIsNotDefined; &hi?;

<p>&amp;nbsp &amp;x; &amp;#; &amp;#x;
&amp;#87654321;
&amp;#abcdef0;
&amp;ThisIsNotDefined; &amp;hi?;</p>

Although HTML5 does accept some entity references without a trailing semicolon (such as &copy), these are not recognized here, because accepting them would make the grammar too ambiguous:

Example 325

Markdown HTML Demo
&copy

<p>&amp;copy</p>

Strings that are not on the list of HTML5 named entities are not recognized as entity references either:

Example 326

Markdown HTML Demo
&MadeUpEntity;

<p>&amp;MadeUpEntity;</p>
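To see why the non-references above come out as `&amp;...` in the HTML, it may help to sketch the two steps together: decode what is a valid reference, then HTML-escape whatever remains. The names `render_text` and `_decode` are hypothetical, `html.entities.html5` stands in for entities.json, and the invalid-code-point check is an assumption.

```python
import re
from html.entities import html5  # keys such as "copy;" map to the decoded text

# One pattern for the three forms: hexadecimal, decimal, and named.
_REF = re.compile(
    r"&(?:#[Xx]([0-9A-Fa-f]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]*));"
)

def _decode(m: re.Match) -> str:
    hex_digits, dec_digits, name = m.groups()
    if name is not None:
        # Names that are not in the table stay as literal source text.
        return html5.get(name + ";", m.group(0))
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return "\ufffd"
    return chr(cp)

def render_text(source: str) -> str:
    """Decode references, then escape the result for HTML output."""
    decoded = _REF.sub(_decode, source)
    return (decoded.replace("&", "&amp;").replace("<", "&lt;")
                   .replace(">", "&gt;").replace('"', "&quot;"))
```

Anything the pattern does not accept keeps its literal `&`, which the escaping step then turns into `&amp;`.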

Entity and numeric character references are recognized in any context besides code spans or code blocks, including URLs, link titles, and fenced code block info strings:

Example 327

Markdown HTML Demo
<a href="&ouml;&ouml;.html">

<a href="&ouml;&ouml;.html">

Example 328

Markdown HTML Demo
[foo](/f&ouml;&ouml; "f&ouml;&ouml;")

<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>

Example 329

Markdown HTML Demo
[foo]

[foo]: /f&ouml;&ouml; "f&ouml;&ouml;"

<p><a href="/f%C3%B6%C3%B6" title="föö">foo</a></p>

Example 330

Markdown HTML Demo
``` f&ouml;&ouml;
foo
```

<pre><code class="language-föö">foo
</code></pre>
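The link examples above show two different results for the same source text: the title keeps the decoded `ö`, while the destination is percent-encoded. A small sketch of that difference, assuming UTF-8 percent-encoding via `urllib.parse.quote`; the helper names and the choice of `safe` characters are illustrative assumptions, not taken from the specification.

```python
import re
from html.entities import html5
from urllib.parse import quote

_NAMED = re.compile(r"&([A-Za-z][A-Za-z0-9]*;)")

def decode_named(text: str) -> str:
    """Decode named references only (numeric forms omitted for brevity)."""
    return _NAMED.sub(lambda m: html5.get(m.group(1), m.group(0)), text)

def link_destination(raw: str) -> str:
    """Decode references, then percent-encode non-ASCII characters as UTF-8."""
    # Assumption: URL punctuation and existing "%" escapes are left alone.
    return quote(decode_named(raw), safe="/:?#[]@!$&'()*+,;=%")

def link_title(raw: str) -> str:
    """Titles are decoded but not percent-encoded."""
    return decode_named(raw)
```

For the destination `/f&ouml;&ouml;` this yields `/f%C3%B6%C3%B6`, and for the title `f&ouml;&ouml;` it yields `föö`, matching the expected HTML in the examples.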

Entity and numeric character references are treated as literal text in code spans and code blocks:

Example 331

Markdown HTML Demo
`f&ouml;&ouml;`

<p><code>f&amp;ouml;&amp;ouml;</code></p>

Example 332

Markdown HTML Demo
    f&ouml;f&ouml;

<pre><code>f&amp;ouml;f&amp;ouml;
</code></pre>
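For code spans and code blocks the decoding step is simply skipped, and only HTML escaping is applied. A minimal sketch, with the hypothetical helper `render_code_span`:

```python
def render_code_span(content: str) -> str:
    """Render code span content: no reference decoding, only HTML escaping."""
    escaped = (content.replace("&", "&amp;").replace("<", "&lt;")
                      .replace(">", "&gt;").replace('"', "&quot;"))
    return f"<code>{escaped}</code>"
```

Because the `&` is escaped rather than interpreted, a browser displays the reference exactly as it was typed.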


Entity and numeric character references cannot be used in place of symbols indicating structure in CommonMark documents.

Example 333

Markdown HTML Demo
&#42;foo&#42;
*foo*

<p>*foo*
<em>foo</em></p>

Example 334

Markdown HTML Demo
&#42; foo

* foo

<p>* foo</p>
<ul>
<li>foo</li>
</ul>

Example 335

Markdown HTML Demo
foo&#10;&#10;bar

<p>foo

bar</p>

Example 336

Markdown HTML Demo
&#9;foo

<p>→foo</p>

Example 337

Markdown HTML Demo
[a](url &quot;tit&quot;)

<p>[a](url &quot;tit&quot;)</p>
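One way an implementation might keep decoded characters from acting as structure is to tag them as literal when they are decoded, so later passes skip them. The sketch below illustrates the idea for emphasis delimiters only. The names `tokenize` and `delimiter_candidates` are hypothetical, and this is not how any particular parser is known to work.

```python
import re
from html.entities import html5

_REF = re.compile(
    r"&(?:#[Xx]([0-9A-Fa-f]{1,6})|#([0-9]{1,7})|([A-Za-z][A-Za-z0-9]*));"
)

def _decode(m: re.Match):
    """Return the decoded text, or None if the name is unknown."""
    hex_digits, dec_digits, name = m.groups()
    if name is not None:
        return html5.get(name + ";")
    cp = int(hex_digits, 16) if hex_digits is not None else int(dec_digits)
    if cp == 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return "\ufffd"
    return chr(cp)

def tokenize(text: str):
    """Split text into (string, from_reference) pairs."""
    tokens, pos = [], 0
    for m in _REF.finditer(text):
        decoded = _decode(m)
        if decoded is None:
            continue  # not a valid reference: stays part of the plain run
        if m.start() > pos:
            tokens.append((text[pos:m.start()], False))
        tokens.append((decoded, True))
        pos = m.end()
    if pos < len(text):
        tokens.append((text[pos:], False))
    return tokens

def delimiter_candidates(tokens) -> int:
    """Count '*' characters that are still eligible as emphasis delimiters."""
    return sum(s.count("*") for s, from_ref in tokens if not from_ref)
```

With this tagging, `&#42;foo&#42;` yields no delimiter candidates, while `*foo*` yields two, which matches the behavior shown in Example 333.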